Abstract: Technological and methodological advances have been critical for the rapidly evolving field of proteomics. The
development of fusion tag systems is essential for purification and analysis of recombinant proteins. The HaloTag is a 34
kDa monomeric protein derived from a bacterial haloalkane dehalogenase. The majority of fusion tags in use today utilize
a reversible binding interaction with a specific ligand. The HaloTag system is unique in that it forms a covalent linkage to
its chloroalkane ligand. This linkage permits attachment of the HaloTag to a variety of functional reporters, which can be
used to label and immobilize recombinant proteins. The success rate for HaloTag expression of soluble proteins is very
high and comparable to that of the maltose binding protein (MBP) tag. Furthermore, cleavage of the HaloTag does not result
in the protein insolubility that is often observed with the MBP tag. In the present report, we describe applications of the
HaloTag system in our ongoing investigation of protein-protein interactions of the Y. pestis Type 3 secretion system on a
custom protein microarray. We also describe the use of affinity purification/mass spectrometry (AP/MS) to evaluate the
utility of the HaloTag system for characterizing DNA binding activity and protein specificity.

Keywords: HaloTag, protein-protein interactions, protein-DNA interactions, expression, immobilization, Type 3 secretion
factors, E. coli RpoA.



INTRODUCTION 

DNA sequencing technology has advanced sharply over the past 15 years [1]. These advances
have enabled the sequencing of many large and small genomes, resulting in the completion
of over 3,000 bacterial genome sequences (including ~150 archaea) and nearly 200
eukaryotic and mammalian genome sequences (http://www.ncbi.nlm.nih.gov/sites/genome).
The access to this massive quantity of data has
had a strong ripple effect leading to an increased demand for 
new technologies that will enable scientists to study the ac- 
tivities and functions of these gene sequences in a high 
throughput manner. Among the numerous discoveries en- 
abled by genome sequence data, one somewhat unanticipated 
finding relates to the fact that at least one-third of the open 
reading frames (ORFs) encoded in genomes has no predicted 
function based on BLAST analysis [2-4]. Interestingly, the 
number of genes of unknown function increases in a linear 
manner as we sequence additional genomes [5]. One might imagine that as we sequence
more genomes, the rate at which novel genes are identified would begin to decrease
rapidly. This is clearly not the case, which strongly supports the view that the number
of unique gene sequences and functions encoded on our planet is very large. For most
microbial species, 10-30% or more of the ORFs encoded in one strain's genome are novel
compared to another strain belonging to the same species. The gene pool of many
bacterial species may exceed several tens of thousands of unique genes. It is likely
that by the end of this decade, we will have sequenced over 10 million genes of unknown
function!

This humbling realization emphasizes the need for substantial improvements in the area
of functional genomics if we are to keep pace with the ever-increasing ease with which
genes and genomes are sequenced. One documented phenomenon, referred to as
non-orthologous gene displacement (NOD), may provide an inroad to tackling the
monumental problem of determining the function of uncharacterized genes. NODs represent
cases where two proteins perform the same cellular function but do not possess an
ancestral relationship. We know of several cases, such as eukaryotic and prokaryotic
DNA polymerases, that essentially carry out the same cellular functions but do not
share common ancestral relationships. In other words, these functions arose
independently during evolution. The vast majority of the assigned functions of genes
are based on BLAST and orthology (conservation of DNA or amino acid sequence). If genes
arise independently, they by definition do not share ancestry, nor do they share amino
acid sequence identity. Because the scientific research community has developed
strategies to assay a wide range of known protein functions over the years, it may
follow that the screening of novel proteins of unknown function using familiar assay
systems will yield a surprising number of experimentally determined gene functions.
While NODs may partially explain why we are accumulating more and more genes of unknown
function in our databases, we remain highly ignorant as to the frequency of NODs in
nature.

1875-3973/12

2012 Bentham Open

Improving Soluble Expression and Applications in Protein Functional Analysis

Current Chemical Genomics, 2012, Volume 6

Massively parallel technologies have been developed,
such as microfluidics and DNA and protein microarrays,
which present important vehicles to partially enable the
large-scale characterization of gene/protein function [6-12]. 
Our ability to determine the function of genes places strong 
demands on a variety of disciplines related to recombinant 
protein technologies. The large-scale characterization of protein
function requires very efficient recombinant protein
production in a high-throughput environment and the necessary
automation to perform high-throughput functional
screens [13, 14]. Likewise, complementary technologies that
broaden the use of recombinant proteins such as labeling 
methods, sub-cellular localization determination, enzymatic 
activity and substrate specificity will also need to be devel- 
oped and advanced if we are to make significant progress. 

Among the numerous challenges associated with large- 
scale functional characterization of proteins is the choice of 
expression systems that are to be employed. Given the fact 
that several systems offer some discrete advantage, in an 
ideal world, one would employ many platforms. For practi- 
cal reasons researchers are forced to make difficult decisions 
regarding which platform provides the greatest overall utility 
for the objectives in question. Among the variety of tools
being developed that show promise for enabling the functional
characterization of proteins, the HaloTag technology
developed by scientists at Promega (Madison, WI) is
notable [15, 16]. Here we provide an overview of the functional
assays and experience we have developed in conjunction
with the HaloTag technology.

We have used the HaloTag technology for a number of
functional studies, including protein microarrays, affinity
purification-based analysis of DNA-protein and protein-protein
interactions, and protein complex identification [7, 17]. The HaloTag is a
modified haloalkane dehalogenase designed to covalently 
bind a series of chloroalkane derivatives such as fluoro- 
phore-labeled ligands (Promega). We have observed im- 
proved solubility of fusion proteins using this system, com- 
parable to that achieved by the best solubilization fusion 
partner, the maltose-binding protein MBP [18]. The HaloTag 
vector (Promega) adopted a Flexi cloning system that uses 
traditional restriction site cloning methods. We found this 
cloning method to be inadequate for high-throughput cloning 
of genes, and have adapted the cloning platform for com- 
patibility with Gateway and Ligation Independent Cloning 
(LIC) procedures [19-22]. We have used these vectors in a 
number of studies including the expression and purification 
of proteins derived from Influenza virus H1N1, Y. pestis, S. 
pneumoniae and B. mallei. Genes were expressed using sev- 
eral expression systems including E. coli, a cell-free (wheat 
germ) system and mammalian cells. The HaloTag supports 
development of functional assays, such as fluorescence po- 
larization, FRET, on-chip purification in protein microarrays 
and also allows monitoring sub-cellular protein localization. 
The rapid covalent attachment of the HaloTag to its specific
ligand is a critical feature that separates the HaloTag from
other tags that use reversible interactions [23]. The high
affinity covalent interaction is extremely rapid and allows
binding reactions to be carried out in minutes. This has proven
advantageous in that we observe a dramatic reduction in
the background, non-specific binding events that reduce
assay signal-to-noise ratios [16, 24].



HISTORICAL DEVELOPMENT OF HALOTAG 

The development of the HaloTag is the result of rational engineering of a bacterially
encoded haloalkane dehalogenase (DhaA) derived from a Rhodococcus sp. [25], carried out
in the laboratories at Promega [15, 16, 24]. The occurrence of this enzyme is
phylogenetically restricted to a small number of taxa. The 34 kDa protein cleaves the
carbon-halogen bond of a number of aliphatic halogenated compounds through a mechanism
involving a hydrolytic triad within the active site of the enzyme. During the
carbon-halogen cleavage reaction, nucleophilic displacement of the terminal halogen by
Asp106 produces a transient covalent complex between the enzyme and its substrate. This
complex is then hydrolyzed in a reaction in which His273 activates a water molecule. In
order to stabilize the intermediate, the His273 residue was replaced with a Phe residue
that occupies a similar volume in space but cannot act as a base to carry out the
hydrolysis reaction. Therefore, the covalently linked substrate remains trapped in the
active site of the enzyme. Additional residues were mutated, some to increase the
accessibility of the active site to the ligand and others to enhance solubility and
other characteristics of the final HaloTag protein. These efforts have resulted in a
novel and robust system for conducting recombinant protein studies in a wide variety of
formats.

POTENTIAL ADVANTAGES OF COVALENT LINK- 
AGE 

Research objectives focused on high throughput func- 
tional characterization of proteins have led to the develop- 
ment of a variety of novel methodological strategies and 
technologies. Many of these strategies rely on the immobili- 
zation of recombinant proteins to matrices with a very large 
surface area [9, 11, 26-31]. Many of the biochemical and physical interaction studies
being carried out therefore face two challenges: the demands of large-scale screening,
and immobilization to solid substrates, which in some cases may generate significant
non-specific binding and high levels of background in the assays performed. The HaloTag
technology offers some discrete and potentially important advantages in addressing
these two issues, based on the covalent and very high affinity interaction between the
HaloTag and its ligand [15, 16, 32]. The covalent linkage of the HaloTag to immobilized
surfaces ensures that high stringency washes may be performed without concern of
removing the immobilized proteins [33]. Perhaps equally important is the high affinity
interaction of the HaloTag and its ligand. The on-rate of the interaction at typical
protein and ligand concentrations drives the reaction to near completion very rapidly.
Functional assays performed with HaloTag recombinant proteins can therefore be
conducted in a reduced time frame, thereby decreasing the mass-action-driven,
non-specific background signals that are favored by longer incubation times.
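
The kinetic argument above can be illustrated with a simple pseudo-first-order model. The sketch below is illustrative only: it assumes the ligand is in large excess over the HaloTag fusion protein and that the reaction is irreversible, so the labeled fraction after time t is 1 - exp(-k[L]t). The rate constant and ligand concentration are hypothetical placeholders, not values measured in this study.

```python
import math

def fraction_labeled(k_app, ligand_conc, t):
    """Fraction of tag covalently labeled after t seconds.

    Assumes pseudo-first-order conditions (ligand in large excess) and an
    irreversible reaction: fraction = 1 - exp(-k_app * [L] * t).
    k_app is an apparent second-order rate constant (M^-1 s^-1),
    ligand_conc is in M.
    """
    return 1.0 - math.exp(-k_app * ligand_conc * t)

def time_to_fraction(k_app, ligand_conc, target):
    """Seconds needed to reach the target labeled fraction (0 < target < 1)."""
    return -math.log(1.0 - target) / (k_app * ligand_conc)

# Hypothetical values chosen only for illustration (not from this study).
K_APP = 1e6     # M^-1 s^-1 (assumed)
LIGAND = 1e-6   # 1 uM ligand (assumed)

if __name__ == "__main__":
    for t in (1, 5, 60):
        print(f"t = {t:>2} s: {fraction_labeled(K_APP, LIGAND, t):.3f} labeled")
    print(f"time to 99% labeled: {time_to_fraction(K_APP, LIGAND, 0.99):.1f} s")
```

Under these assumed values the reaction approaches completion within seconds; slower rate constants or lower ligand concentrations lengthen the required incubation proportionally.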

ADAPTATION OF HALOTAG TO GATEWAY EX- 
PRESSION VECTORS 

One of the essential elements for high-throughput protein 
production and functional screening is the selection of an 
expression vector with a specific fusion tag. The trends in
high-throughput recombinant protein expression indicate that 
no single expression system is ideal for all target proteins. 
Therefore, many expression pipelines include multiple ex- 
pression vectors which are used in parallel to increase the 
overall success rate of recovering soluble proteins. However, 
in order to use multiple expression vectors, efficient cloning 
methods such as the Gateway recombination cloning method 
are required [19, 20]. Although the use of multiple expression
vectors increases the number of recovered soluble target
proteins, for practical purposes the use of expression vectors
is often limited to one or a few vectors in most high
throughput gene cloning pipelines. Therefore, an ideal expression
vector possesses excellent fusion tag properties
(solubility and purification efficiency) and a high throughput
cloning procedure amenable to automation. We have attempted
to approach this ideal by constructing a series of expression
vectors that combine the qualities associated with the
HaloTag with the ease and efficiency associated with either the
LIC or Gateway cloning methods. The Gateway compatible
expression vector has the added advantage that it allows in- 
vestigators to utilize existing entry clone sets which have 
been produced and made available through public reposito- 
ries (http://www.beiresources.org) [14, 18]. We have evalu- 
ated the outcomes of a number of protein expression trials 
using these chimeric vectors. 

The vectors pFN18A, pFN19A, pFC20A and pFC14A were obtained from Promega for expression
of various target proteins in E. coli, cell-free lysates and mammalian expression
systems (Fig. 1). We modified these vectors in a variety of ways. Each of the modified
vectors contains the E. coli ccdB cassette, which encodes a product that is toxic to
E. coli [34]. We adapted the Gateway cloning method to prepare clones which were easier
to use than existing entry clones. The expression vector pGW-nHalo is based on the
vector pFN18A, in which the barnase cassette was replaced with the attR recombination
cloning sites and the ccdB cassette. We also constructed pHis-cHalo, another Gateway
compatible vector, based on the pFC20A and T02 (pHis) vectors [14], which contains an
N-terminal His-tag and a C-terminal HaloTag. In addition, we constructed a ligation
independent cloning vector (pLIC-Halo) based on the pMCSG7 vector backbone [35], which
likewise consists of an N-terminal His-tag and a C-terminal HaloTag. The His-tag can be
removed by thrombin cleavage after purification [21, 22]. The addition of the His-tag
in these vectors enables its use for purification when downstream applications of the
purified protein require the HaloTag for fluorophore labeling.

COMPARISON OF EXPRESSION VECTORS 

Success rates in recovering solubly expressed target proteins using the various HaloTag
vectors (Fig. 1) were evaluated in E. coli, a cell-free expression system and mammalian
cells, and compared with previous expression studies that employed fusion partners such
as His-tag, MBP, DsbA and GST (Table 1 and Supplementary Table 1) [14, 18]. As depicted
in Fig. (1), each HaloTag vector has specific characteristics such as the location of
the HaloTag, drug resistance markers and cloning strategies. Four of these vectors,
pFN19A, pFC20A, pHis-cHalo and pLIC-Halo, contain dual promoters, T7 and SP6, which
allow expression of proteins in either E. coli or wheat germ in vitro expression
systems. In contrast, the vectors pFN18A and pGW-nHalo allow the expression of proteins
in the E. coli expression system with the T7 promoter alone.

The His-tag expression vector T02 (pHis) yielded soluble proteins in 43.2% of attempts
when targeting the complete set of ORFs encoded in S. pneumoniae TIGR4 [14]. A second
study focused on expression of proteases resulted in similar outcomes with 39.6%
success [18]. The success frequencies were below 50% for each of the vectors tested in
these studies except in cases employing the MBP-tag or the HaloTag. The pMBP vector
produced soluble proteins for more than 70% of target proteins. Both pFN19A and
pGW-nHalo, which are N-terminal HaloTag vectors, produced soluble proteins in E. coli
at very similar frequencies.

Fig. (1). The HaloTag expression vectors used for protein expression and functional studies.

Various expression vector systems were used for HaloTag recombinant protein expression. The vectors pFN18A, pFN19A, pFC20A and
pFC14A were obtained from Promega for expression of target proteins in E. coli, cell-free and mammalian expression systems. To make
them compatible with additional cloning methods, we further modified these vectors to contain the ccdB cassette for positive selection
of cloned plasmids. The expression vector pGW-nHalo is based on pFN18A, into which the ccdB cassette was incorporated. The pHis-cHalo
vector was based on pFC20A and the T02 (pHis) vector [14]. The expression vector pLIC-Halo was also based on pFC20A, with a LIC cloning
site incorporated together with the ccdB cassette. These vectors were used for expression and solubility studies of proteins from
S. pneumoniae TIGR4, Y. pestis KIM 10, B. mallei ATCC 23344 and H1N1, and for functional analysis.

Table 1. Comparison of Success Rates of Soluble Expression of Recombinant Proteins Derived from Various Expression Vectors

Fusion tag           Expression   Solubility   HaloTag Vector        Expression   Solubility
pHis: His-tag¹       59.5%        43.2%        pFN18A³ (23)          73.9%        56.5%
pHis: His-tag²       54.0%        39.6%        pGW-nHalo³ (74)       82.4%        70.3%
pMBP: ASP-MBP²       72.7%        70.1%        pFN19A³ (52)          75.0%        69.2%
pSP-MBP: MBP²        64.7%        43.9%        pFC20A³ (67)          67.2%        61.2%
pDsbA: DsbA²         58.8%        47.6%        pFC14A⁴ (10)          80.0%        N/A
pEXP7: GST²          49.7%        42.8%        HaloTag (average)     75.2%        65.7%

¹ Genome-wide protein expression and purification of the S. pneumoniae proteome; success rates were calculated based on efforts
applied to 1529 destination clones [14]. ² Putative proteases derived from S. pneumoniae TIGR4, B. anthracis Ames and Y. pestis KIM.
Success rates were calculated based on 187 destination clones [18]. ³ Combination of protein sets (23-80 clones) of DNA binding
proteins, Type 3 secretion system, Type 6 secretion system and/or randomly selected proteins in E. coli, S. pneumoniae TIGR4,
Y. pestis, B. anthracis Ames, and Burkholderia mallei ATCC23344. ⁴ 10 H1N1 proteins were used in the study and solubility
information is not available. The numbers in parentheses are the number of clones in the study.



Our efforts to construct a vector (pHis-cHalo) containing the Gateway attR cloning
sites and a C-terminal HaloTag did not yield a generally useful protein expression
vector, for reasons that remain unclear. In contrast, pGW-nHalo, a Gateway compatible
vector with an N-terminal HaloTag, displayed excellent expression and solubility of
target proteins, similar to outcomes obtained with pFN19A, which also contains an
N-terminal HaloTag. Influenza virus (H1N1) proteins were expressed using pFC14A, which
contains the CMV promoter and a C-terminal HaloTag, and 8 proteins from this virus were
well expressed in HEK293T cells. These same proteins were expressed in truncated form
when using the E. coli expression system. Although the target proteins in these
expression attempts are not identical and the outcomes are therefore not directly
comparable, the proteins tested with the HaloTag vectors include a randomly selected
set as well as difficult membrane-localized protein sets such as type III and type VI
secretion systems. Overall, the body of experience using HaloTag is now large enough to
enable comparison to overall outcomes associated with other vector systems and to
conclude that the HaloTag enhances expression and solubility of target proteins to
levels comparable to that of the previously defined "best" solubilization tag, MBP
[36, 37].

INCREASE SUCCESS RATES OF SOLUBLE EXPRESSION

In order to characterize proteins of interest, soluble expression and purification of
proteins are essential. Here, we describe two strategies we employed to increase the
success rate of soluble expression/purification of proteins of interest. First, a
complementary pair of expression vectors containing the same fusion tag (C-terminal and
N-terminal) increases the overall recovery of soluble proteins. We have used the
expression vectors pFN19A and pFC20A for this purpose to express a group of E. coli
proteins (Fig. 2). Second, we compared the success rate of traditional column-based
purification procedures to that of in situ purification and determined that the latter
increased overall success and yield of purified proteins (Fig. 3).

[Fig. 2 gel image; lane labels: LeuC, LeuD, HisF, HisH, RpoA, RpoB, GyrA, GyrB, each with lanes N and C]



Fig. (2). Expression of E. coli proteins for protein-protein interactions. The HaloTag recombinant proteins were visualized with TMR ligand. 

Fig. (3). A comparison of 20 proteins derived from S. pneumoniae using different expression and purification schemes. Manual in situ
purification of E. coli and in vitro expressed proteins and column purification were compared. The purified proteins were visualized
using rabbit anti-HaloTag antibody followed by a goat anti-rabbit antibody conjugated to the dye Alexa555 (upper). The purity of the
proteins recovered from each strategy was examined by comparing the ratio of signal generated by an anti-E. coli antibody to that of
the anti-HaloTag antibody. Each well represents purification of the following protein, 1: SP_0291, 2: SP_0308, 3: SP_0321, 4: SP_0435,
5: SP_0604, 6: SP_0845, 7: SP_0954, 8: SP_0979, 9: SP_1102, 10: SP_1504, 11: SP_1572, 12: SP_1631, 13: SP_1650, 14: SP_1671,
15: SP_1699, 16: SP_1752, 17: SP_1802, 18: SP_1925, 19: SP_1959, and 20: SP_2209.



We used pFN19A (N-terminal HaloTag) and pFC20A (C-terminal HaloTag) to increase the
overall recovery of soluble E. coli proteins of interest: LeuC, LeuD, HisF, HisH, RpoA,
RpoB, GyrA and GyrB. For these studies we used two E. coli expression strains to
enhance the recovery of soluble proteins: BL21(DE3)/pMagic, an E. coli B strain
derivative containing the pMagic plasmid that encodes tRNAs that are rare in E. coli,
and KRX/pGro7, a K-12 derivative containing a plasmid expressing the chaperone complex
GroEL/ES [38]. The pFN19A and pFC20A vectors gave similar outcomes in most cases but
complementary outcomes in several instances, as shown in Fig. (2). For example, LeuD
and GyrA displayed higher soluble expression using pFC20A, while almost no soluble
protein was recovered with pFN19A. In contrast, HisF, HisH and RpoB were recovered as
soluble proteins only with pFN19A. Similarly, HisF and GyrB were expressed in soluble
form at higher levels from pFN19A in KRX/pGro7, while soluble LeuC was expressed at
higher levels using BL21(DE3)/pMagic. Soluble HisF was obtained solely with the
N-terminal HaloTag vector in KRX/pGro7. The combination of the expression vectors
pFN19A and pFC20A and two expression strains allowed the recovery of all targets in
soluble form with adequate yield and purity.
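
The benefit of a complementary vector pair can be expressed as a simple set union. The sketch below encodes the vector-specific outcomes described above; the text does not list per-vector results for LeuC, RpoA and GyrB, so these are assumed here to be soluble with both vectors, and the data structure and function name are illustrative only.

```python
TARGETS = ["LeuC", "LeuD", "HisF", "HisH", "RpoA", "RpoB", "GyrA", "GyrB"]

# Soluble recovery per vector. Vector-specific cases follow the text;
# LeuC, RpoA and GyrB are ASSUMED soluble with both vectors.
SOLUBLE = {
    "pFN19A": {"HisF", "HisH", "RpoB", "LeuC", "RpoA", "GyrB"},
    "pFC20A": {"LeuD", "GyrA", "LeuC", "RpoA", "GyrB"},
}

def coverage(soluble_by_vector, targets):
    """Fraction of targets recovered as soluble, per vector and combined."""
    target_set = set(targets)
    result = {
        vector: len(soluble & target_set) / len(target_set)
        for vector, soluble in soluble_by_vector.items()
    }
    combined = set().union(*soluble_by_vector.values()) & target_set
    result["combined"] = len(combined) / len(target_set)
    return result

if __name__ == "__main__":
    for name, frac in coverage(SOLUBLE, TARGETS).items():
        print(f"{name}: {frac:.0%}")
```

Under these assumptions neither vector alone covers all eight targets, while the pair does, which is the pattern the text reports.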

As part of our ongoing efforts to compare strategies for recombinant protein expression
and purification, and to determine whether any provide a means for achieving higher
overall success frequencies in the recovery of soluble recombinant protein, we
exploited the covalent linkage of HaloTag recombinant proteins as a means of performing
direct protein purification from crude E. coli lysates or from in vitro expression
extracts using HaloLink microarray slides (Fig. 3). We randomly selected 20 ORFs
encoded in the genome of S. pneumoniae and cloned these sequences into pFC20A.
Recombinant proteins were either expressed in the BL21(DE3)/pMagic strain or by in
vitro expression using the TnT® SP6 Coupled Wheat Germ Extract System (Promega). When
the over-expressed proteins derived from BL21(DE3)/pMagic were purified using HaloLink
resin, 75% of targets were recovered as soluble protein. When these proteins were
expressed and purified using direct purification on HaloLink glass slides, we recovered
100% of the target proteins in soluble form. Finally, when using in vitro transcription
and translation systems followed by direct purification using HaloLink slides, we
recovered 85% of the target proteins in soluble form. Conclusions drawn from these
studies must be taken with caution; however, it appears that direct purification of
recombinant proteins, whether expressed in vitro or in E. coli, may be more successful
than traditional column-based purification schemes. The average purity of recovered
proteins over-expressed in the E. coli BL21(DE3)/pMagic strain and purified using in
situ purification is estimated to be more than 90%, which is adequate for a variety of
downstream applications.
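
The caution expressed above can be made quantitative with a confidence interval on each recovery rate. The sketch below uses the Wilson score interval, a standard choice for small samples; the statistical treatment is an illustration added here and was not part of the study. The success counts are derived from the reported percentages of 20 targets.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Soluble recoveries out of 20 targets, from the percentages in the text.
SCHEMES = {
    "HaloLink resin (column)": 15,    # 75%
    "HaloLink slide, E. coli": 20,    # 100%
    "HaloLink slide, in vitro": 17,   # 85%
}

if __name__ == "__main__":
    for name, k in SCHEMES.items():
        low, high = wilson_interval(k, 20)
        print(f"{name}: {k}/20, ~95% interval {low:.2f} to {high:.2f}")
```

With only 20 targets per scheme the intervals are wide and overlap, which is consistent with treating the differences between schemes as suggestive rather than conclusive.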

USE OF HALOTAG RECOMBINANT PROTEINS TO 
IDENTIFY PROTEIN-PROTEIN INTERACTIONS 

As we learn more about the cellular functions of proteins, we see that few proteins
operate in isolation from other macromolecules, particularly other proteins. The
two-hybrid method and immunoprecipitation "pull down" experiments have contributed to
our growing perception that proteins often function via physical interaction with one
or more proteins [12, 39]. Our knowledge of binary interactions between proteins and of
multi-protein complexes is extensive for some examples (e.g. RNA and DNA polymerases
and ribosomal subunits) but fundamentally lacking for others. Independent methods are
needed to validate and discover protein-protein interactions [12]. We have used the
HaloTag technology in a number of formats as a means of identifying or validating
binary protein interactions and also to identify constituents of multi-protein
complexes [7, 40].

Protein interactions that occur within the Y. pestis Type 3 secretion system (T3SS)
were identified using a protein array-based method in which labeled HaloTag recombinant
proteins were used as prey to detect binary protein interactions with immobilized bait
proteins. The T3SS apparatus, also known as an injectisome, functions to directly
inject effector proteins expressed by the bacterium into its mammalian host during
infection [41-45]. To carry out this interrogation we cloned the T3SS bait proteins
into pMBP (His-MBP tag), reported previously [18], and immobilized the expressed
proteins on a Cu²⁺ coated microarray slide surface (Fig. 4) [46]. The immobilized bait
proteins were challenged with specific HaloTag prey proteins derived from pFN18A to
establish the specificity of their interactions, using indirect detection via an
anti-HaloTag antibody or biotin-labeled HaloTag followed by fluorescently labeled
streptavidin. The pFN18A vector was used for this study because the HaloTag recombinant
T3SS proteins derived from pFN19A and pFC20A were partially degraded when expressed in
E. coli. These experiments are particularly challenging since the T3SS is a
multi-protein complex involving a number of membrane localized components that are
difficult to express as soluble proteins. An example of the results achieved using this
strategy is shown in Fig. (4). In this instance, when the HaloTag prey protein Y0049
(LcrG) is used to interrogate the protein microarray, it interacts specifically with
Y0050 (LcrV), an interaction that has been reported previously using independent
methods [47-51].

We evaluated the use of the HaloTag for a more challenging goal: capturing the subunits
of multi-protein complexes. We selected a well-characterized multi-protein complex, RNA
polymerase from E. coli, to examine a pull-down scheme wherein one suspected member of
a protein complex is fused to the HaloTag. Based on several studies it is known that
RpoA forms direct contacts with itself, AceE, RplA, RpoC, NusA and RpoB, whereas
indirect linkages within the complex include the additional proteins TufA and Tig
[52-58]. We cloned and over-expressed the RpoA subunit as an N-terminal HaloTag
(pFN19A) fusion protein in E. coli BL21(DE3)/pMagic. The assumption made in this
experimental procedure is that the fusion protein will retain its ability to interact
with the other proteins in the complex with an efficiency similar to that of
endogenously expressed RpoA. The RpoA in the pFN19A vector was over-expressed in 5 mL
of E. coli culture. The RpoA fusion protein in the whole cell lysate was immobilized
onto HaloLink resin and washed extensively to eliminate non-specifically interacting
proteins. Following recovery of the fusion protein, several protein bands were
recovered (Fig. 5A). These bands were cut from the gel and subjected to
MALDI-TOF/TOF-MS to identify the proteins present in the RpoA complex. Our results
illustrate the power of the approach, as all of the known members of the protein
complex were recovered, as shown in Fig. (5B). This platform can be easily adapted to a
high throughput platform such as a 96-well format, thus allowing AP/MS to be performed
in a high throughput manner.

Fig. (4). Identification of T3SS Interactions in situ Using Protein Microarrays. (A) Scheme used to identify protein-protein
interactions using HaloTag recombinant proteins and His-MBP tagged recombinant proteins. (B) Immobilized His-MBP tagged T3SS proteins
on Cu²⁺/IDA/PEG were visualized with anti-His-tag antibody (left) and proteins interacting with LcrG were detected by the rabbit
anti-Halo antibody and goat anti-rabbit antibody labeled with Alexa555 (right).

Fig. (5). Multi-protein Complex Discovery. (A) The pull-down study using HaloTag recombinant RpoA. M: molecular weight marker;
1: unbound protein; 2: wash; 3: eluted protein after TEV protease cleavage; 4: eluted protein after removal of TEV protease;
5: concentrated protein sample. The arrows indicate the position of HaloTag recombinant (lane 1) and cleaved (lane 3) RpoA.
(B) Interaction map of E. coli proteins identified by MALDI-MS/MS from the E. coli RpoA pull-down study.
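
For the RpoA pull-down described above, comparing the MS identifications against the known partners is a simple set operation, sketched below. The partner lists come from the text; the function name and the example hit list (including the extra protein "GroL") are hypothetical and are not the data behind Fig. (5B).

```python
# Known RpoA partners as listed in the text.
DIRECT = {"RpoA", "AceE", "RplA", "RpoC", "NusA", "RpoB"}
INDIRECT = {"TufA", "Tig"}

def classify_hits(hits):
    """Sort MS-identified proteins into direct, indirect and unlisted groups,
    and report which known partners were not recovered."""
    hits = set(hits)
    return {
        "direct": sorted(hits & DIRECT),
        "indirect": sorted(hits & INDIRECT),
        "unlisted": sorted(hits - DIRECT - INDIRECT),
        "missing": sorted((DIRECT | INDIRECT) - hits),
    }

# Hypothetical hit list for illustration only.
example_hits = ["RpoA", "RpoB", "RpoC", "NusA", "TufA", "GroL"]

if __name__ == "__main__":
    for group, proteins in classify_hits(example_hits).items():
        print(f"{group}: {', '.join(proteins) or 'none'}")
```

In a 96-well AP/MS format, the same comparison could be run per bait, with an empty "missing" list indicating that every known partner was recovered.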

USE OF HALOTAG RECOMBINANT PROTEINS TO 
IDENTIFY PROTEIN-DNA INTERACTIONS 

The interest in DNA-protein interactions, particularly those of transcriptional
regulatory proteins, has been significant for nearly three decades now. There are a
variety of methods for studying these interactions, but the majority of these are
refractory to high throughput characterization. We have evaluated a number of methods
including gel mobility shift assays, fluorescence polarization, ChIP-chip and ChIP-Seq
analysis and others [8, 10, 59-70]. Each approach has specific advantages and
disadvantages with respect to ease, reproducibility, sensitivity and specificity. The
proteomic profiling of transcription factors is often hampered by the low-level
expression of these proteins, preventing their visualization in 2DE MS/MS based
experiments or LC/MS/MS studies. We enriched these proteins from crude lysates derived
from Y. pestis by passing the lysate over a DNA cellulose column. The eluted proteins
were indeed strongly enriched for transcription factors and other nucleic acid binding
proteins. Among the list of recovered proteins was a set of 16 hypothetical proteins.
We wished to establish whether these genes of unknown function represented a new class
of transcription factors or nucleic acid binding proteins.

We developed an approach to evaluate the DNA binding 
activity and specificity of these proteins, as described below. 
In this scheme, we cloned each of the putative transcription 
factors into the pFN19A N-terminal HaloTag expression vector. 
The recombinant proteins were expressed in BL21(DE3)/pMagic 
and then immobilized onto HaloLink slides. Of the 16 Y. pestis 
target proteins, 12 were expressed in E. coli and 10 of these 
were recovered as soluble protein. Nine of the soluble proteins 
were effectively purified by direct purification on HaloLink 
slides (Fig. 6). We next fluorescently labeled sheared Y. pestis 
genomic DNA with Cy5. The labeled genomic DNA was then 
mixed with each immobilized HaloTag fusion protein in either 
low or high salt buffer to allow DNA-protein interactions to 
occur. After appropriate washing of the slide surface, the bound 
genomic DNA was recovered from the array and used as a 
hybridization probe for a second array, a DNA oligonucleotide 
tiling microarray. This microarray represents the entire 
Y. pestis genome as a series of overlapping 60-mer 
oligonucleotides alternately covering each strand of DNA, and 
allows the approximation and partial identification of the 
specific DNA sequences bound by the transcription factor. This 
straightforward method is amenable to moderate throughput 
but can be envisioned as a means of characterizing all annotated 
transcription factors encoded in a genome of interest. While our 
experience with this strategy is still limited, we anticipate that 
its success will depend on the affinity of each protein for its 
cognate DNA sequence motifs. It will also depend on our ability 
to identify growth conditions under which the transcription 
factors are expressed in a form that is activated for specific DNA 
binding, as is expected for two-component regulators that 
require phosphorylation for DNA binding activity. 
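
The tiling layout described above (overlapping 60-mer oligonucleotides that alternate between the two DNA strands) can be illustrated with a short sketch. This is not the design procedure used for the array in this report. The 30-nucleotide step between probes is an assumption made only for illustration, since the offset is not stated here, and the names Probe, reverse_complement and tile_genome are hypothetical.

```python
from typing import Iterator, NamedTuple

# Translation table for complementing DNA bases (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class Probe(NamedTuple):
    start: int     # 0-based start coordinate on the forward strand
    strand: str    # "+" (forward) or "-" (reverse)
    sequence: str  # probe sequence written 5'->3'


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_genome(genome: str, probe_len: int = 60, step: int = 30) -> Iterator[Probe]:
    """Yield overlapping probes that alternate between strands.

    probe_len=60 follows the text; step=30 is an ASSUMED offset.
    Any trailing bases that do not fill a whole probe are skipped.
    """
    if not 0 < step < probe_len:
        raise ValueError("step must be positive and smaller than probe_len")
    starts = range(0, len(genome) - probe_len + 1, step)
    for i, start in enumerate(starts):
        window = genome[start:start + probe_len]
        if i % 2 == 0:
            yield Probe(start, "+", window)
        else:
            yield Probe(start, "-", reverse_complement(window))


if __name__ == "__main__":
    toy_genome = "ACGT" * 40  # 160-nt toy sequence, not real Y. pestis data
    for probe in tile_genome(toy_genome):
        print(probe.start, probe.strand, probe.sequence[:10] + "...")
```

Because each probe records its forward-strand start coordinate, a hybridization signal on a probe can be traced back to an approximate genomic interval, which is the sense in which the tiling array gives an approximate rather than exact localization of the bound sequence.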

CONCLUSIONS 

We have adapted the HaloTag technology to current protein 
production platforms and examined its enhancement of soluble 
expression of the proteins of interest. We also examined the 
use of the HaloTag in high throughput functional studies such 
as protein-protein interactions and protein-DNA interactions. 
Several vectors containing HaloTag were made compatible with 
high throughput cloning strategies and examined for their 
efficiency in expressing soluble protein. With the N-terminal 
HaloTag Gateway vector (pGW-nHalo), HaloTag recombinant 
proteins were solubly expressed with a high success rate, and 
the vector can be used for high throughput cloning using 
existing entry clone sets. Soluble 
expression attempts of proteins of interest in E. coli, in vitro 
Fig. (6). Protein Microarray and DNA Tiling Microarray to Identify Protein-DNA Interactions of hypothetical proteins in Y. pestis KIM. 
Binding of Cy5 labeled sheared genomic DNA onto a 16-pad protein array (A) in low salt binding buffer (25 mM KCl) and (B) in high salt 
buffer (150 mM KCl). (C) Array image after recovery of the bound DNAs. (D) DNA tiling microarray with the recovered DNA from protein 
array. The color represents the amount of DNA bound to the proteins. The color scale represents the strongest signals as red followed by 
orange, yellow, green, blue and black. 



and mammalian expression systems were conducted using 
various HaloTag vectors, and the results demonstrated high 
overall success rates. A combination of N-terminal and 
C-terminal HaloTag vectors increases the overall success rate 
of soluble protein recovery. We have employed the HaloTag 
technology in other contexts using protein microarrays for 
high throughput assays such as anti-sera screening and other 
protein functional analyses. In the protein array schemes, the 
HaloTag recombinant proteins were successfully used as prey 
proteins for identification of protein-protein interactions in 
the Y. pestis T3SS with other fusion tagged recombinant 
proteins, and as bait proteins to identify DNA binding activity 
of hypothetical proteins. The HaloTag was also successfully 
used for pull-down assays involving E. coli RpoA as part of a 
multi-protein complex. While we describe here only a limited 
number of applications of the HaloTag technology, many more 
strategies are enabled by this versatile technology. In these 
early days of the post-genomic era, HaloTag and other 
technologies will be important vehicles for better understanding 
the breadth of functions carried out by the awe-inspiring 
number of unique proteins encoded on our planet. 